Distinguishing Distributions with Interpretable Features
Authors
Abstract
• Two semimetrics on distributions, ME and SCF, are based on the differences of analytic functions evaluated at spatial or frequency locations (i.e., features).
• Proposal: choose the features so as to maximize the distinguishability of the distributions, by optimizing a lower bound on test power for a statistical test using these features.
• Result: a powerful, linear-time, nonparametric, interpretable two-sample test. Performance is comparable to the quadratic-time MMD test.
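The abstract does not spell out the statistic, so the following is only a minimal sketch of how an ME-style statistic at spatial locations could be computed, assuming the commonly used form: Gaussian-kernel feature differences at J test locations, normalized by their regularized sample covariance. The helper names (`gaussian_features`, `me_statistic`), the Gaussian kernel, the bandwidth, the regularizer `reg`, and the hand-picked locations are all assumptions made for illustration. In particular, the locations here are fixed by hand and are not optimized against a lower bound on test power as the proposal describes.

```python
import numpy as np


def gaussian_features(samples, locations, bandwidth):
    """Evaluate a Gaussian kernel between every sample and every test location."""
    # Squared Euclidean distances between samples and locations, shape (n, J).
    sq_dists = ((samples[:, None, :] - locations[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def me_statistic(x, y, locations, bandwidth=1.0, reg=1e-5):
    """Hypothetical helper: normalized mean feature difference at J locations.

    Cost is linear in the sample size n (O(n * J * d) for the features,
    O(n * J^2) for the covariance).
    """
    n, num_locations = x.shape[0], locations.shape[0]
    if y.shape[0] != n:
        raise ValueError("this sketch assumes equal sample sizes")
    # One J-dimensional feature difference per paired sample.
    z = gaussian_features(x, locations, bandwidth) - gaussian_features(
        y, locations, bandwidth
    )
    z_mean = z.mean(axis=0)
    # Sample covariance of the differences, regularized for numerical stability.
    cov = np.atleast_2d(np.cov(z, rowvar=False)) + reg * np.eye(num_locations)
    return float(n * z_mean @ np.linalg.solve(cov, z_mean))


# Synthetic data (assumption): a same-distribution pair and a mean-shifted pair.
rng = np.random.default_rng(0)
x = rng.normal(size=(2000, 2))
y_same = rng.normal(size=(2000, 2))
y_shift = rng.normal(loc=1.0, size=(2000, 2))
locations = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]])  # hand-picked

stat_same = me_statistic(x, y_same, locations)
stat_shift = me_statistic(x, y_shift, locations)
print(f"same distribution:    {stat_same:.2f}")
print(f"shifted distribution: {stat_shift:.2f}")
```

In this formulation the statistic is typically compared against a chi-squared quantile with J degrees of freedom to decide whether to reject the hypothesis that the two samples share a distribution. Locations where the feature difference is large are what make the test interpretable, since they indicate where the distributions differ.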
Similar resources
Interpretable Distribution Features with Maximum Testing Power
Two semimetrics on probability distributions are proposed, given as the sum of differences of expectations of analytic functions evaluated at spatial or frequency locations (i.e., features). The features are chosen so as to maximize the distinguishability of the distributions, by optimizing a lower bound on test power for a statistical test using these features. The result is a parsimonious and ...
Mind the Gap: A Generative Approach to Interpretable Feature Selection and Extraction
We present the Mind the Gap Model (MGM), an approach for interpretable feature extraction and selection. By placing interpretability criteria directly into the model, we allow for the model to both optimize parameters related to interpretability and to directly report a global set of distinguishable dimensions to assist with further data exploration and hypothesis generation. MGM extracts disti...
Comparsion Between Several Distributions of Exponential Family and Offering Their Features and Applications
In this paper, we first investigate the probability density functions and failure rate functions of some families of exponential distributions. We then present their features, such as expectation, variance, moments, and maximum likelihood estimation, and we identify the most flexible distributions according to the figures of the probability density function and the failure rate function and f...
Bimodal Gene Expression and Biomarker Discovery
With insights gained through molecular profiling, cancer is recognized as a heterogeneous disease with distinct subtypes and outcomes that can be predicted by a limited number of biomarkers. Statistical methods such as supervised classification and machine learning identify distinguishing features associated with disease subtype but are not necessarily clear or interpretable on a biological lev...
Searching for Important Amino Acids in DNA-binding Proteins for Histogram Methods
We develop a method capable of identifying important amino acids for histogram-based methods that predict DNA-binding propensity. This method can be used both for prediction from sequence information (Tube Histograms) and prediction from structural information (Ball Histograms). We validate our method in prediction experiments using only proteins’ primary structure, achieving favourable accuracies. ...
Publication date: 2016